Gaming Data Overview and Background Knowledge
1. Brief description and data source
The gaming data set covers video games that sold more than 100,000 copies and were released between 1980 and 2016. It was downloaded from Kaggle.com.
Mobile, PC/Mac (computer), social/online, and console are common gaming platforms. A console here is a computing device that outputs a video signal to display a video game that one or more people can play. Popular consoles include the PlayStation 4 Pro, Xbox One X, and Nintendo Switch. Our data set covers video games played on consoles only.
2.Data Cleaning
- We \(\color{red}{\text{removed}}\) all NULL values, Unknown values, and data in 2017 or later.
- We \(\color{red}{\text{converted}}\) Year from String to Numeric.
- We \(\color{red}{\text{added}}\) a new column called Portable that indicates whether the gaming console is portable.
- We have 16187 rows and 12 variables after cleaning the data and adding the Portable variable.
options(stringsAsFactors = FALSE)
Portable <- function(df) {
len <- length(df$Platform)
new_vec <- vector(mode = "numeric", length = len)
protvec <- c("DS", "GB", "3DS", "GBA")
for (i in 1:len) {
if (df$Platform[i] %in% protvec) {
new_vec[i] <- 1
} else {
new_vec[i] <- 0
}
}
return(new_vec)
}
3.Variables
| Variable | Description |
|----------|:-------------:|
| Rank | Ranking of overall sales |
| Name | The game's name |
| Platform | Platform of the game's release (e.g., PC, PS4) |
| Year | Year of the game's release |
| Genre | Genre of the game |
| Publisher | Publisher of the game |
| NA_Sales | Sales in North America (in millions) |
| EU_Sales | Sales in Europe (in millions) |
| JP_Sales | Sales in Japan (in millions) |
| Other_Sales | Sales in the rest of the world (in millions) |
| GLobal_Sales | Total worldwide sales (in millions) |
| Portable | Whether the gaming console is portable (1 = yes, 0 = no) |
4.Glimpse of Data
a.
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 16187 obs. of 12 variables:
$ Rank : num 3825 1679 1879 1711 638 ...
$ Name : chr "Seaman" "NFL 2K" "NFL 2K1" "Shenmue" ...
$ Platform : chr "DC" "DC" "DC" "DC" ...
$ Year : num 1999 1999 2000 1999 1998 ...
$ Genre : chr "Simulation" "Sports" "Sports" "Adventure" ...
$ Publisher : chr "Sega" "Sega" "Sega" "Sega" ...
$ NA_Sales : num 0 1.12 1.02 0.52 1.26 1.1 0.41 0 0 0 ...
$ EU_Sales : num 0 0.05 0.05 0.24 0.61 0.51 0.23 0 0 0 ...
$ JP_Sales : num 0.52 0 0 0.38 0.46 0.12 0.47 1.01 0.54 0.62 ...
$ Other_Sales : num 0 0.02 0.02 0.04 0.08 0.08 0.03 0 0 0 ...
$ GLobal_Sales: num 0.52 1.2 1.09 1.18 2.42 1.81 1.14 1.01 0.54 0.62 ...
$ Portable : num 0 0 0 0 0 0 0 1 1 1 ...
c. 12 unique genres:
Simulation, Sports, Adventure, Platform, Racing, Action, Misc, Role-Playing, Puzzle, Fighting, Strategy, Shooter
d. 575 unique publishers:
Top 10 Publishers based on frequency: Electronic Arts, Activision, Namco Bandai Games, Ubisoft, Konami Digital Entertainment, THQ, Nintendo, Sony Computer Entertainment, Sega, Take-Two Interactive
1.Global trends and analysis
1.b Top 10 best selling publishers over time?
- Top 3 publishers are Nintendo, EA, and Activision.
top10 <- games %>%select(Publisher,GLobal_Sales)%>%group_by(Publisher)%>%summarise(GLobal_Sales=sum(GLobal_Sales))%>%arrange(desc(GLobal_Sales))%>%head(10)
p <- ggplot(top10, aes(x = reorder(Publisher, -GLobal_Sales), y = GLobal_Sales, label = round(GLobal_Sales, 2))) +
  # top10 already holds one summed row per publisher, so geom_col() draws the bars directly
  geom_col(width = 0.8, col = "black", fill = "skyblue") +
  labs(title = "Top 10 best selling publishers", caption = "Sales in Million") +
  geom_text(col = "black", size = 4, vjust = -1) +
  ylab("Global Sales") + xlab("Publisher") +
  # wrap long publisher names onto one word per line
  scale_x_discrete(labels = function(x) str_wrap(x, width = 1))
p

1.d Best selling genre for the TOP 3 publishers
- Top 3 genres for each publisher:
- Nintendo: Sports, Role-Playing, Platform
- EA: Sports, Shooter, Racing
- Activision: Sports, Shooter, Action
top_three_publisher <- subset(games,Publisher %in% c("Nintendo","Electronic Arts","Activision"))
top_three_publisher$Portable <- factor(top_three_publisher$Portable)
top_three_publisher_platform <- top_three_publisher %>%
select(Publisher,Genre,Portable,GLobal_Sales)%>%
group_by(Publisher,Genre,Portable)%>%
summarise(GLobal_Sales=sum(GLobal_Sales))%>%
arrange(desc(GLobal_Sales))
top_new <- top_three_publisher_platform%>%group_by(Publisher)%>%top_n(3)
top_new$Publisher_g = factor(top_new$Publisher, levels=c("Nintendo","Electronic Arts","Activision"))
ggplot(top_new, aes(x=Genre, y=GLobal_Sales, color=Publisher,shape=Portable)) +
geom_point(size=3)+
scale_color_discrete(breaks=c("Nintendo","Electronic Arts","Activision"))+
labs(title="The top 3 genre for the top 3 publishers",caption="Sales in Million") +
geom_segment(aes(x=Genre,xend=Genre, y=0, yend=GLobal_Sales))+
geom_text(aes(label=GLobal_Sales), hjust = -0.3, size = 2.6,fontface = "bold",color='black') +
theme( plot.title = element_text(size=17,hjust=-0.5)) +
facet_wrap(~ Publisher_g, nrow = 5, scales = 'free', strip.position = 'right')+
ylim(0, max(top_new$GLobal_Sales + 10))+ylab("Global Sales")+
coord_flip()

2. Regional trends and analysis
2.a Top 3 best selling platform by regions
- We studied the regional trends by looking at the top platforms for each region as shown from the bar chart.
games <- games[!(games$Year %in% c("N/A", "2017", "2020")),]
games <- games %>% gather(Region, Revenue, 7:10)
games$Region <- factor(games$Region)
mytheme_1 <- function() {
return(theme(axis.text.x = element_text(angle = 90, size = 10, vjust = 0.4), plot.title = element_text(size = 15, vjust = 2),axis.title.x = element_text(size = 12, vjust = -0.35)))
}
mytheme_2 <- function() {
return(theme(axis.text.x = element_text(size = 10, vjust = 0.4), plot.title = element_text(size = 15, vjust = 2),axis.title.x = element_text(size = 12, vjust = -0.35)))
}
mycolors <- c("#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC","#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC","#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC","#8E9CA3")
top_platform_region <- games %>%
group_by(Region, Platform) %>%
summarize(Revenue = sum(Revenue)) %>%
arrange(desc(Revenue)) %>%
top_n(3)
Selecting by Revenue
ted <- ggplot(top_platform_region, aes(Region, Revenue, fill = Platform)) +
geom_bar(position = "dodge", stat = "identity") +
ggtitle("Top 3 best selling platform by regions") +
ylab("Revenue in Millions") +
xlab("Region") +
mytheme_2() +
theme(legend.position = "top") +
scale_fill_manual(values = c("#8E9CA3","#F8A31B", "#AA3929", "#E25033", "#E2C59F", "#556670"))
ggplotly(ted)
2.b Top 3 best selling publisher by regions
- We studied the regional trends by looking at the top 3 best selling publishers for each region.
top_genres_region <- games %>%
group_by(Region, Publisher) %>%
summarize(Revenue = sum(Revenue)) %>%
arrange(desc(Revenue)) %>%
top_n(3)
ted2 <- ggplot(top_genres_region, aes(Region, Revenue, fill = Publisher)) +
geom_bar(position = "dodge", stat = "identity") +
ggtitle("Top 3 best selling publisher by region") +
ylab("Sales in Millions") +
xlab("Region") +
mytheme_2() +
theme(legend.position = "top")
ggplotly(ted2)
2.c Best selling genre for particular regions
- This heat map identifies the top-selling genres for each region by displaying a deeper color (purple) for genres with high revenues.
year_genre <- games %>%
group_by(Year, Genre, Region) %>%
summarise(TotalRevenue = sum(Revenue))
ggplot(year_genre, aes(Year, Genre, fill = TotalRevenue)) +
geom_tile(color = "white") +
ggtitle(" Best selling genre for particular regions") +
facet_wrap(vars(Region), ncol = 4) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
scale_color_gradient(low="pink", high= "purple")+
scale_fill_gradient(low="pink", high= "purple")

2.d Top 3 best selling genre by region
- We studied the regional trends by looking at the top 3 best selling genres for each region.
top_genres_region <- games %>%
group_by(Region, Genre) %>%
summarize(Revenue = sum(Revenue)) %>%
arrange(desc(Revenue)) %>%
top_n(3)
ted2 <- ggplot(top_genres_region, aes(Region, Revenue, fill = Genre)) +
geom_bar(position = "dodge", stat = "identity") +
ggtitle("Top 3 best selling genre by region") +
ylab("Revenue in Millions") +
xlab("Region") +
mytheme_2() +
theme(legend.position = "top")
ggplotly(ted2)
3 Investment options and recommendations
3.a Global sales proportion by region
- For most of the period, North America accounts for the highest proportion of global sales. European sales are rising and surpass North American sales in 2015-2016. Japan’s proportion of global sales declined for many years, but it has risen steadily in the most recent years.
games <- read_csv("games.csv")
df_trial <- data_frame(sort(games$Year), NA_Sales = games$NA_Sales, games$EU_Sales,
games$JP_Sales, games$Other_Sales, games$GLobal_Sales)
games3 <- games %>%
select(Year, NA_Sales, EU_Sales, JP_Sales, Other_Sales,GLobal_Sales) %>%
group_by(Year) %>%
summarise(NA_sales_prop = sum(NA_Sales)/sum(GLobal_Sales),
EU_sales_prop = sum(EU_Sales)/sum(GLobal_Sales),
JP_sales_prop = sum(JP_Sales)/sum(GLobal_Sales),
Other_sales_prop = sum(Other_Sales)/sum(GLobal_Sales),
Global_sales_prop = sum(GLobal_Sales)/sum(GLobal_Sales))
mycolors <- c("#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC")
regions <- c( "darkgreen" = "North America", "blue" = "Europe" , "sienna" = "Japan" , "orange" = "Other Regions", "black" = "Global")
ggplot(games3, aes(x = Year)) +
  # map each line to its region name so the legend does not depend on string sort order
  geom_line(aes(y = NA_sales_prop, color = "North America")) +
  geom_line(aes(y = EU_sales_prop, color = "Europe")) +
  geom_line(aes(y = JP_sales_prop, color = "Japan")) +
  geom_line(aes(y = Other_sales_prop, color = "Other Regions")) +
  geom_line(aes(y = Global_sales_prop, color = "Global")) +
  labs(title = "Sales per region from 1980-2016", y = "Proportion of global sales") +
  scale_color_manual(name = "Regions",
                     breaks = c("North America", "Europe", "Japan", "Other Regions", "Global"),
                     values = c("North America" = "darkgreen", "Europe" = "blue", "Japan" = "sienna",
                                "Other Regions" = "orange", "Global" = "black"))

3.b Summary
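| Option | North America | Europe | Japan |
|----------------|:---------:|:----------:|:-------------|
| Publisher | Nintendo | Nintendo | Nintendo |
| Platform | X360 | PS3 | DS |
| Genre | Action | Action | Role-Playing |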
Appendix
Data Exploration.
ggplot(games, aes(x=Year, fill=..count..)) +
geom_bar()+
scale_color_gradient(low="#771C19", high= "#F27314")+
scale_fill_gradient(low="#771C19", high= "#F27314")+
labs(title="Number of Games Released every Year", x= "Year",
y= "Total Number of Games")+
geom_text(stat='count',aes(label=..count..), hjust=-0.1,color="black", size=2.5)+
scale_x_continuous(breaks = 1980:2016) + theme_minimal()+
coord_flip()

Data Exploration 2

Data Exploration 3
ok <- games %>% select(Year,GLobal_Sales,Genre)%>%group_by(Year,Genre)%>%
summarise(Total_sales=sum(GLobal_Sales))
ok1 <- arrange(ok, desc(Year))
plot_ly(ok1, x = ~Total_sales, y = ~Genre, z = ~Year) %>% layout( title ="Sales by genre from 1980 - 2016") %>%
add_markers(color = ~Genre, size = 0.5)
---
title: "Team 5 Gaming Data Notebook"
subtitle: "![](game2.png){width=250}"
output: html_notebook
---
**Team Members: Fucheng Yao, Huaiping Wang, Limei Huang, Eman Nagib, Kwangwoo Kim** 

```{r include=FALSE}
# loading libraries
library(DT)
library(RColorBrewer)
library(tidyverse)
library(ggthemes)
library(plotly)
library(readxl)
library(ggplot2)
library(dplyr)
library(tidyr)
library(wesanderson)
games <- read_csv("games.csv") 
mycolors <- c("#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC")

# modifying chart size
options(repr.plot.width=250, repr.plot.height=100)
```


### Gaming Data Overview and Background Knowledge
##### 1. Brief description and data source
The gaming data set covers video games that sold more than 100,000 copies and were released between 1980 and 2016. This [gaming data set](https://www.kaggle.com/gregorut/videogamesales) was downloaded from [Kaggle.com](https://www.kaggle.com).

Mobile, PC/Mac (computer), social/online, and console are common gaming platforms. A console here is a computing device that outputs a video signal to display a video game that one or more people can play. Popular consoles include the PlayStation 4 Pro, Xbox One X, and Nintendo Switch. Our data set covers video games played on consoles only.

---

##### 2.Data Cleaning 
* We $\color{red}{\text{removed}}$ all **NULL** values, **Unknown** values, and data in **2017 or later**.
* We $\color{red}{\text{converted}}$ Year from **String** to **Numeric**. 
* We $\color{red}{\text{added}}$ a new column called *Portable* that indicates whether the gaming console is portable.
* We have `r nrow(games)` rows and `r ncol(games)` variables after cleaning the data and adding the *Portable* variable. 
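
The cleaning steps listed above can be sketched in base R as follows. This is only an illustration on a made-up data frame: the placeholder strings `"N/A"` and `"Unknown"`, the toy rows, and the names `raw` and `clean` are assumptions, not the exact code used to produce `games.csv`.

```r
# Toy stand-in for the raw file; the values are invented for illustration.
raw <- data.frame(
  Name      = c("Game A", "Game B", "Game C", "Game D"),
  Year      = c("2006", "N/A", "2017", "1999"),
  Publisher = c("Nintendo", "Sega", "Ubisoft", "Unknown"),
  stringsAsFactors = FALSE
)

# Treat the assumed "N/A" and "Unknown" placeholders as missing values.
raw[raw == "N/A" | raw == "Unknown"] <- NA

# Drop incomplete rows, convert Year from string to numeric,
# and keep only releases up to and including 2016.
clean <- raw[complete.cases(raw), ]
clean$Year <- as.numeric(clean$Year)
clean <- clean[clean$Year <= 2016, ]

clean
```

Only "Game A" survives in this toy example: the other rows have a missing year, an unknown publisher, or a release year of 2017.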

```{r echo=TRUE}
options(stringsAsFactors = FALSE)
Portable <- function(df) {
 len <- length(df$Platform)
 new_vec <- vector(mode = "numeric", length = len)
 protvec <- c("DS", "GB", "3DS", "GBA")
 for (i in 1:len)  {
   if (df$Platform[i] %in% protvec) {
     new_vec[i] <- 1
   } else {
     new_vec[i] <- 0
   }
 }
 return(new_vec)
}

```

```{r}
games$Portable <- Portable(games)
```
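
Because `%in%` is already vectorised, the same flag can be computed without an explicit loop. The sketch below is an alternative under that assumption; the name `portable_flag` is hypothetical and is not part of the notebook.

```r
# Same handheld list as in the Portable() function above.
handhelds <- c("DS", "GB", "3DS", "GBA")

# %in% returns one TRUE/FALSE per platform; as.numeric() turns that into 1/0.
portable_flag <- function(platform) {
  as.numeric(platform %in% handhelds)
}

portable_flag(c("DS", "PS2", "GBA", "Wii"))
```

A side benefit of this form is that an empty input simply returns an empty numeric vector, whereas a `for (i in 1:len)` loop iterates over `c(1, 0)` when `len` is 0.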

---

##### 3.Variables 
| Variable   |      Description      |
|----------|:-------------:|
| Rank | Ranking of overall sales |
| Name | The game's name |
| Platform | Platform of the game's release (e.g., PC, PS4) |
| Year | Year of the game's release |
| Genre | Genre of the game |
| Publisher | Publisher of the game |
| NA_Sales | Sales in North America (in millions) |
| EU_Sales | Sales in Europe (in millions) |
| JP_Sales | Sales in Japan (in millions) |
| Other_Sales | Sales in the rest of the world (in millions) |
| GLobal_Sales | Total worldwide sales (in millions) |
| Portable | Whether the gaming console is portable (1 = yes, 0 = no) |

---

##### 4.Glimpse of Data 
###### a.
```{r}
knitr::opts_chunk$set(
  echo = FALSE
)
str(games,give.attr=F)
```


###### b. **`r length(unique(games$Platform))` unique platforms:**    
`r unique(games$Platform) `

```{r eval=FALSE, include=FALSE}
unique(games$Platform)
```


###### c. **`r length(unique(games$Genre))` unique genres:**   
`r unique(games$Genre) `


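```{r include=FALSE}
# Return the ten publishers that appear most often in the data.
# Defined before its first inline use so the notebook also knits top to bottom.
top_ten <- function(x) {
  df_temp <- count(x, Publisher, sort = TRUE)
  return(df_temp$Publisher[1:10])
}
```

###### d. **`r length(unique(games$Publisher))` unique publishers:** 
Top 10 Publishers based on frequency: `r top_ten(games)`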


---

### Problem Set:  Where and how to invest in the gaming industry 
*  Global trends and analysis 
*  Regional trends and analysis 
*  Investment options and recommendations 


---
### 1.Global trends and analysis 
######  1.a  What is the top platform each year? 
* Once a platform catches on in the market, it tends to dominate for a few years.
* The PlayStation platform was popular for nearly 20 years.

```{r echo=TRUE,results=FALSE}

top_platforms <- games %>%
             group_by(Year, Platform) %>%
             summarize(Revenue = sum(GLobal_Sales)) %>%
             arrange(desc(Revenue)) %>%
             top_n(1)
ggplot(top_platforms, aes(Year, Revenue, fill = Platform)) + 
  geom_bar(stat = "identity") +
  ggtitle("Top Platform by Revenue each year") +
  theme(legend.position = "top") + 
  scale_fill_manual(values = mycolors)+
  scale_x_continuous(breaks = seq(min(top_platforms$Year),max(top_platforms$Year),5))+
  theme(axis.text.x = element_text(angle = 90))
```
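
To make the "sum per group, then keep the top row" logic of the chunk above concrete, here is a minimal base R sketch on invented numbers. The toy data frame and the names `toy`, `rev_by_platform`, and `top` are assumptions for illustration; the notebook itself uses `dplyr`.

```r
# Invented sales figures, in millions.
toy <- data.frame(
  Year         = c(2000, 2000, 2000, 2001, 2001),
  Platform     = c("PS", "PS", "N64", "PS2", "GBA"),
  GLobal_Sales = c(1.5, 2.0, 3.0, 4.0, 2.5),
  stringsAsFactors = FALSE
)

# Sum sales per Year/Platform pair (the summarize() step).
rev_by_platform <- aggregate(toy["GLobal_Sales"],
                             by = toy[c("Year", "Platform")],
                             FUN = sum)

# Keep the row(s) whose revenue equals the yearly maximum (the top_n(1) step).
yearly_max <- ave(rev_by_platform$GLobal_Sales, rev_by_platform$Year, FUN = max)
top <- rev_by_platform[rev_by_platform$GLobal_Sales == yearly_max, ]
top <- top[order(top$Year), ]

top
```

As with `top_n(1)`, ties are kept, so a year can contribute more than one row if two platforms have identical revenue.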


######  1.b  Top 10 best selling publishers over time? 
* Top 3 publishers are Nintendo, EA, and Activision.

```{r echo=TRUE,fig.width=12}

top10 <- games %>%select(Publisher,GLobal_Sales)%>%group_by(Publisher)%>%summarise(GLobal_Sales=sum(GLobal_Sales))%>%arrange(desc(GLobal_Sales))%>%head(10)


p <- ggplot(top10, aes(x = reorder(Publisher, -GLobal_Sales), y = GLobal_Sales, label = round(GLobal_Sales, 2))) +
  # top10 already holds one summed row per publisher, so geom_col() draws the bars directly
  geom_col(width = 0.8, col = "black", fill = "skyblue") +
  labs(title = "Top 10 best selling publishers", caption = "Sales in Million") +
  geom_text(col = "black", size = 4, vjust = -1) +
  ylab("Global Sales") + xlab("Publisher") +
  # wrap long publisher names onto one word per line
  scale_x_discrete(labels = function(x) str_wrap(x, width = 1))

p
```

###### 1.c  Best selling platforms for the TOP 3 publishers
* Top 3 platforms for each publisher:
* Nintendo: Wii, GB, DS
* EA: X360, PS3, PS2
* Activision: X360, PS3, PS2

```{r echo=TRUE, results=FALSE,fig.width=7}
top_three_publisher <- subset(games,Publisher %in% c("Nintendo","Electronic Arts","Activision"))
top_three_publisher$Portable <- factor(top_three_publisher$Portable)
top_three_publisher_platform <- top_three_publisher %>%select(Publisher,Platform,Portable,GLobal_Sales)%>%group_by(Publisher,Platform,Portable)%>%summarise(GLobal_Sales=sum(GLobal_Sales))%>%arrange(desc(Publisher))
top_new <- top_three_publisher_platform%>%group_by(Publisher)%>%top_n(3)
top_new$Publisher_f = factor(top_new$Publisher, levels=c("Nintendo","Electronic Arts","Activision"))
ggplot(top_new, aes(x=Platform, y=GLobal_Sales, color=Publisher,shape=Portable)) +
 geom_point(size=3)+
 scale_color_discrete(breaks=c("Nintendo","Electronic Arts","Activision")) +
 labs(title="     Top 3 platforms for the top 3 publishers",caption="Sales in Million") +
 geom_segment(aes(x=Platform,xend=Platform, y=0, yend=GLobal_Sales))+
 geom_text(aes(label=GLobal_Sales), hjust = -0.3, size = 2.6,fontface = "bold",color='black') +
 theme( plot.title = element_text(size=17,hjust=-0.4)) +
   facet_wrap(~ Publisher_f, nrow = 5, scales = 'free', strip.position = 'right')+
 ylim(0, max(top_new$GLobal_Sales + 10))+ylab("Global Sales")+
 coord_flip()
```

###### 1.d  Best selling genre for the TOP 3 publishers
* Top 3 genres for each publisher:
* Nintendo: Sports, Role-Playing, Platform
* EA: Sports, Shooter, Racing
* Activision: Sports, Shooter, Action

```{r echo=TRUE, results=FALSE}
top_three_publisher <- subset(games,Publisher %in% c("Nintendo","Electronic Arts","Activision"))
top_three_publisher$Portable <- factor(top_three_publisher$Portable)
top_three_publisher_platform <- top_three_publisher %>%
select(Publisher,Genre,Portable,GLobal_Sales)%>%
group_by(Publisher,Genre,Portable)%>%
summarise(GLobal_Sales=sum(GLobal_Sales))%>%
arrange(desc(GLobal_Sales))
top_new <- top_three_publisher_platform%>%group_by(Publisher)%>%top_n(3)
top_new$Publisher_g = factor(top_new$Publisher, levels=c("Nintendo","Electronic Arts","Activision"))
ggplot(top_new, aes(x=Genre, y=GLobal_Sales, color=Publisher,shape=Portable)) +
  geom_point(size=3)+
scale_color_discrete(breaks=c("Nintendo","Electronic Arts","Activision"))+
labs(title="The top 3 genre for the top 3 publishers",caption="Sales in Million") +
geom_segment(aes(x=Genre,xend=Genre, y=0, yend=GLobal_Sales))+
geom_text(aes(label=GLobal_Sales), hjust = -0.3, size = 2.6,fontface = "bold",color='black') +
theme( plot.title = element_text(size=17,hjust=-0.5)) +
  facet_wrap(~ Publisher_g, nrow = 5, scales = 'free', strip.position = 'right')+
ylim(0, max(top_new$GLobal_Sales + 10))+ylab("Global Sales")+
coord_flip()
```

###### 1.e  Genre frequency over the years and Platform popularity over the years
* Here, popularity is defined as the number of games published on each platform. In the early 1980s there were only a few platforms on the market (2600, NES), but more and more emerged later. Different platforms also tend to dominate the market in different periods, and there is a shift from less portable platforms to more portable ones.

| Years          |      Most Popular Platform  | Most Profitable Platform |   
|----------------|:---------:| :----------:|   
| 1980 - 1982 |  2600  |  2600 |
|1983| 2600 | NES|
| 1984 - 1988 |  NES   |  NES  |
| 1989        |  GB    |  GB   |
| 1990        |  NES   |  SNES |
| 1991 - 1994 |  SNES  |  SNES |
| 1995 - 2000 |  PS    |  PS   | 
| 2001 - 2005 |  PS2   |  PS2  |
| 2006        |  PS2   |  Wii  |
| 2007 - 2010 |  DS    |  Wii  | 
| 2011 - 2014 |  PS3   |  PS3  |
| 2015 - 2016 |  PS4   |  PS4  |

```{r}
games %>%  group_by(Year, Genre) %>%
 summarise(Count = n()) %>% arrange(Year, desc(Count)) %>%
 spread(key = Genre, value = Count, fill = 0) %>%
 gather(2:13, key = "Genre", value = "Count" ) %>%
 arrange(Year, desc(Count)) -> GenrePop
 ggplot(GenrePop, aes(x = Genre, y = Count)) +
          geom_point(aes(frame = Year, color = Genre, size = Count)) +  theme(axis.text.x = element_text(angle=65, vjust=0.5), legend.position = "none")+
 scale_y_continuous(breaks = seq(0,max(GenrePop$Count),50))+
   ggtitle("Genre frequency over the years")-> ppn3
ggplotly(ppn3)
```


```{r fig.height=8, warning=FALSE}
games %>%  group_by(Year, Platform) %>%
 summarise(count = n()) %>% arrange(Year, desc(count)) %>%
 spread(key = Platform, value = count, fill = 0) %>%
 gather(2:32, key = "Platform", value = "count" ) %>%
 arrange(Year, desc(count)) -> PlatformPop
 ggplot(PlatformPop, aes(x = Platform, y = count)) +
          geom_point(aes(frame = Year, color = Platform, size = count)) +  theme(axis.text.x = element_text(angle=65, vjust=0.5), legend.position = "none")+
 scale_y_continuous(breaks = seq(0,max(PlatformPop$count),50))+
  ggtitle("Platform popularity over the years") -> ppn2
ggplotly(ppn2)
```


### 2. Regional trends and analysis 
###### 2.a Top 3 best selling platform by regions
* We studied the regional trends by looking at the top platforms for each region as shown from the bar chart.
```{r echo=T}
games <- games[!(games$Year %in% c("N/A", "2017", "2020")),]
games <- games %>% gather(Region, Revenue, 7:10) 
games$Region <- factor(games$Region)

mytheme_1 <- function() {
  
 return(theme(axis.text.x = element_text(angle = 90, size = 10, vjust = 0.4), plot.title = element_text(size = 15, vjust = 2),axis.title.x = element_text(size = 12, vjust = -0.35)))
  
}

mytheme_2 <- function() {
  
 return(theme(axis.text.x = element_text(size = 10, vjust = 0.4), plot.title = element_text(size = 15, vjust = 2),axis.title.x = element_text(size = 12, vjust = -0.35)))
  
}

mycolors <- c("#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC","#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC","#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC","#8E9CA3")


top_platform_region <- games %>%
             group_by(Region, Platform) %>%
             summarize(Revenue = sum(Revenue)) %>%
             arrange(desc(Revenue)) %>%
             top_n(3)

ted <- ggplot(top_platform_region, aes(Region, Revenue, fill = Platform)) + 
  geom_bar(position = "dodge", stat = "identity")  +
  ggtitle("Top 3 best selling platform by regions") +
  ylab("Revenue in Millions") +
  xlab("Region") +
  mytheme_2() +
  theme(legend.position = "top") + 
  scale_fill_manual(values = c("#8E9CA3","#F8A31B", "#AA3929", "#E25033", "#E2C59F", "#556670"))
ggplotly(ted)
```
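
The `gather(Region, Revenue, 7:10)` step above reshapes the four regional sales columns from wide to long format. A minimal base R sketch of the same idea, selecting the columns by name instead of by position, might look like this. The figures and the names `wide`, `sales_cols`, and `long` are made up for illustration.

```r
# Invented sales figures, in millions; only two regions to keep it short.
wide <- data.frame(
  Platform = c("Wii", "DS"),
  NA_Sales = c(41.5, 11.4),
  EU_Sales = c(29.0, 11.0),
  stringsAsFactors = FALSE
)

# Naming the columns avoids relying on their position in the file.
sales_cols <- c("NA_Sales", "EU_Sales")

# One output row per (game, region) pair: the sales columns are stacked
# one after another, so the other columns are repeated once per region.
long <- data.frame(
  Platform = rep(wide$Platform, times = length(sales_cols)),
  Region   = rep(sales_cols, each = nrow(wide)),
  Revenue  = unlist(wide[sales_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```

Selecting by name is a design choice: positional indices such as `7:10` silently pick the wrong columns if a column is added or reordered upstream.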

###### 2.b Top 3 best selling publisher by regions
* We studied the regional trends by looking at the top 3 best selling publishers for each region.
```{r echo=TRUE,results=F,fig.width=10}
top_genres_region <- games %>%
             group_by(Region, Publisher) %>%
             summarize(Revenue = sum(Revenue)) %>%
             arrange(desc(Revenue)) %>%
             top_n(3)

ted2 <- ggplot(top_genres_region, aes(Region, Revenue, fill = Publisher)) + 
  geom_bar(position = "dodge", stat = "identity")  +
  ggtitle("Top 3 best selling publisher by region") +
  ylab("Sales in Millions") +
  xlab("Region") +
  mytheme_2() +
  theme(legend.position = "top")

ggplotly(ted2)
```

###### 2.c Best selling genre for particular regions 
* This heat map identifies the top-selling genres for each region by displaying a deeper color (purple) for genres with high revenues.
```{r echo=TRUE,fig.width=10,fig.height=5,results=F}
year_genre <- games %>% 
                group_by(Year, Genre, Region) %>% 
                  summarise(TotalRevenue = sum(Revenue)) 
                

ggplot(year_genre, aes(Year, Genre, fill = TotalRevenue)) +
    geom_tile(color = "white") +
    ggtitle("                       Best selling genre for particular regions") + 
    facet_wrap(vars(Region), ncol = 4) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
  scale_color_gradient(low="pink", high= "purple")+
    scale_fill_gradient(low="pink", high= "purple")
```

###### 2.d Top 3 best selling genre by region
* We studied the regional trends by looking at the top 3 best selling genres for each region.
```{r echo=TRUE,results=F,fig.width=8}
top_genres_region <- games %>%
             group_by(Region, Genre) %>%
             summarize(Revenue = sum(Revenue)) %>%
             arrange(desc(Revenue)) %>%
             top_n(3)

ted2 <- ggplot(top_genres_region, aes(Region, Revenue, fill = Genre)) + 
  geom_bar(position = "dodge", stat = "identity")  +
  ggtitle("Top 3 best selling genre by region") +
  ylab("Revenue in Millions") +
  xlab("Region") +
  mytheme_2() +
  theme(legend.position = "top")

ggplotly(ted2)
```


### 3 Investment options and recommendations 
###### 3.a Global sales proportion by region 
* For most of the period, North America accounts for the highest proportion of global sales. European sales are rising and surpass North American sales in 2015-2016. Japan's proportion of global sales declined for many years, but it has risen steadily in the most recent years.
```{r echo=T,results=F}
games <- read_csv("games.csv") 
df_trial <- data_frame(sort(games$Year), NA_Sales = games$NA_Sales, games$EU_Sales,
                      games$JP_Sales, games$Other_Sales, games$GLobal_Sales)
games3 <- games %>%
 select(Year, NA_Sales, EU_Sales, JP_Sales, Other_Sales,GLobal_Sales) %>%
 group_by(Year) %>%
 summarise(NA_sales_prop = sum(NA_Sales)/sum(GLobal_Sales),
           EU_sales_prop = sum(EU_Sales)/sum(GLobal_Sales),
           JP_sales_prop = sum(JP_Sales)/sum(GLobal_Sales),
           Other_sales_prop = sum(Other_Sales)/sum(GLobal_Sales),
           Global_sales_prop = sum(GLobal_Sales)/sum(GLobal_Sales))
mycolors <- c("#771C19", "#AA3929", "#8E9CA3", "#556670", "#000000", "#E25033", "#F27314", "#F8A31B", "#E2C59F", "#B6C5CC")
regions <- c( "darkgreen" = "North America", "blue" = "Europe" ,  "sienna" = "Japan" , "orange" = "Other Regions", "black" = "Global")

ggplot(games3, aes(x = Year)) +
  # map each line to its region name so the legend does not depend on string sort order
  geom_line(aes(y = NA_sales_prop, color = "North America")) +
  geom_line(aes(y = EU_sales_prop, color = "Europe")) +
  geom_line(aes(y = JP_sales_prop, color = "Japan")) +
  geom_line(aes(y = Other_sales_prop, color = "Other Regions")) +
  geom_line(aes(y = Global_sales_prop, color = "Global")) +
  labs(title = "Sales per region from 1980-2016", y = "Proportion of global sales") +
  scale_color_manual(name = "Regions",
                     breaks = c("North America", "Europe", "Japan", "Other Regions", "Global"),
                     values = c("North America" = "darkgreen", "Europe" = "blue", "Japan" = "sienna",
                                "Other Regions" = "orange", "Global" = "black"))
```
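
The proportions plotted above are each region's yearly sales divided by that year's global sales. The arithmetic can be checked by hand on a small invented example. The numbers and the names `toy`, `regions`, `sums`, and `shares` are assumptions for illustration only.

```r
# Invented sales figures, in millions; GLobal_Sales is the row total.
toy <- data.frame(
  Year         = c(2015, 2015, 2016),
  NA_Sales     = c(1.00, 0.50, 0.20),
  EU_Sales     = c(0.50, 0.50, 0.60),
  JP_Sales     = c(0.25, 0.00, 0.10),
  Other_Sales  = c(0.25, 0.00, 0.10),
  GLobal_Sales = c(2.00, 1.00, 1.00)
)
regions <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# Yearly totals per column; row names are the years.
sums <- as.matrix(rowsum(toy[c(regions, "GLobal_Sales")], group = toy$Year))

# Divide each region's yearly total by the yearly global total.
shares <- sums[, regions] / sums[, "GLobal_Sales"]

shares
```

For 2015, North America sold 1.5 of 3.0 million units, so its share is 0.5; each row of `shares` sums to 1 because the toy global figure equals the sum of the four regions.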

###### 3.b Summary 

| Option |     North America  | Europe | Japan |
|----------------|:---------:| :----------:|:-------------|
|Publisher| Nintendo | Nintendo |Nintendo |
|Platform | X360 | PS3 | DS |
|Genre   | Action | Action | Role-Playing|



### Appendix

###### Data Exploration. 
```{r echo=TRUE,fig.height=8}
ggplot(games, aes(x=Year, fill=..count..)) +
    geom_bar()+
    scale_color_gradient(low="#771C19", high= "#F27314")+
    scale_fill_gradient(low="#771C19", high= "#F27314")+
    labs(title="Number of Games Released every Year", x= "Year", 
         y= "Total Number of Games")+
    geom_text(stat='count',aes(label=..count..), hjust=-0.1,color="black", size=2.5)+
    scale_x_continuous(breaks = 1980:2016) + theme_minimal()+
  coord_flip()
```

###### Data Exploration 2 
```{r}
options(stringsAsFactors = FALSE)
Portable <- function(df) {
 len <- length(df$Platform)
 new_vec <- vector(mode = "numeric", length = len)
 protvec <- c("DS", "GB", "3DS", "GBA")
 for (i in 1:len)  {
   if (df$Platform[i] %in% protvec) {
     new_vec[i] <- 1
   } else {
     new_vec[i] <- 0
   }
 }
 return(new_vec)
}
games$Portable <- Portable(games)

games2<- games

games2 %>%
 transmute(Year2008 = cut(Year, breaks = c(1980, 2008, 2017),
                          labels = c("Before 2008", "After 2008"),
                          include.lowest = TRUE, right = FALSE),
           Portable = factor(Portable),
           Sales = GLobal_Sales) %>%
 group_by(Year2008, Portable) %>%
 summarise(count = n()) -> games3

ggplot(games3) +
 geom_bar(aes(x = Portable , y = count, fill = Year2008),
                position = "dodge", stat = "identity") +
 labs(subtitle="Portable Vs. Non-Portable", x= "Type",
        y= "Total Number of Games Released") +
 theme(plot.title = element_text(hjust = 0.5)) 
```
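
The `cut()` call above uses `right = FALSE`, which makes the intervals closed on the left. A small check on a few sample years (the vector `years` is made up for illustration) shows where 2008 itself ends up:

```r
years <- c(1980, 2007, 2008, 2016)

# Same breaks and labels as in the chunk above.
period <- cut(years,
              breaks = c(1980, 2008, 2017),
              labels = c("Before 2008", "After 2008"),
              include.lowest = TRUE, right = FALSE)

data.frame(years, period)
```

With these settings the bins are [1980, 2008) and [2008, 2017], so games released in 2008 are counted under the "After 2008" label.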

###### Data Exploration 3 
```{r echo=TRUE,fig.width=10,fig.height=5,results=F}
ok <- games %>% select(Year,GLobal_Sales,Genre)%>%group_by(Year,Genre)%>%
  summarise(Total_sales=sum(GLobal_Sales)) 
ok1 <- arrange(ok, desc(Year))

plot_ly(ok1, x = ~Total_sales, y = ~Genre, z = ~Year) %>% layout( title ="Sales by genre from 1980 - 2016") %>%
  add_markers(color = ~Genre, size = 0.5)

```








